16 research outputs found
CIAGAN: Conditional Identity Anonymization Generative Adversarial Networks
The unprecedented increase in the usage of computer vision technology in
society goes hand in hand with an increased concern in data privacy. In many
real-world scenarios like people tracking or action recognition, it is
important to be able to process the data while taking careful consideration in
protecting people's identity. We propose and develop CIAGAN, a model for image
and video anonymization based on conditional generative adversarial networks.
Our model is able to remove the identifying characteristics of faces and bodies
while producing high-quality images and videos that can be used for any
computer vision task, such as detection or tracking. Unlike previous methods,
we have full control over the de-identification (anonymization) procedure,
ensuring both anonymization as well as diversity. We compare our method to
several baselines and achieve state-of-the-art results.Comment: CVPR 202
Deep watershed detector for music object recognition
Optical Music Recognition (OMR) is an important and challenging area within music information retrieval, the accurate detection of music symbols in digital images is a core functionality of any OMR pipeline. In this paper, we introduce a novel object detection method, based on synthetic energy maps and the watershed transform, called Deep Watershed Detector (DWD). Our method is specifically tailored to deal with high resolution images that contain a large number of very small objects and is therefore able to process full pages of written music. We present state-of-the-art detection results of common music symbols and show DWD’s ability to work with synthetic scores equally well as on handwritten music
Deep watershed detector for music object recognition
Optical Music Recognition (OMR) is an important and challenging area within music information retrieval, the accurate detection of music symbols in digital images is a core functionality of any OMR pipeline. In this paper, we introduce a novel object detection method, based on synthetic energy maps and the watershed transform, called Deep Watershed Detector (DWD). Our method is specifically tailored to deal with high resolution images that contain a large number of very small objects and is therefore able to process full pages of written music. We present state-of-the-art detection results of common music symbols and show DWD’s ability to work with synthetic scores equally well as on handwritten music
DeepScores and Deep Watershed Detection : current state and open issues
This paper gives an overview of our current Optical Music Recognition (OMR) research. We recently released the OMR data set DeepScores as well as the object detection method Deep Watershed Detector. We are currently taking some additional steps to improve both of them. Here we summarize current and future efforts, aimed at improving usefulness on real-world tasks and tackling extreme class imbalance
DeepScores : a dataset for segmentation, detection and classification of tiny objects
We present the DeepScores dataset with the goal of advancing the state-of-the-art in small object recognition by placing the question of object recognition in the context of scene understanding. DeepScores contains high quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. With close to a hundred million small objects, this makes our dataset not only unique, but also the largest public dataset. DeepScores comes with ground truth for object classification, detection and semantic segmentation. DeepScores thus poses a relevant challenge for computer vision in general, and optical music recognition (OMR) research in particular. We present a detailed statistical analysis of the dataset, comparing it with other computer vision datasets like PASCAL VOC, SUN, SVHN, ImageNet, MS-COCO, as well as with other OMR datasets. Finally, we provide baseline performances for object classification, intuition for the inherent difficulty that DeepScores poses to state-of-the-art object detectors like YOLO or R-CNN, and give pointers to future research based on this dataset
Data-Driven but Privacy-Conscious: Pedestrian Dataset De-identification via Full-Body Person Synthesis
The advent of data-driven technology solutions is accompanied by an
increasing concern with data privacy. This is of particular importance for
human-centered image recognition tasks, such as pedestrian detection,
re-identification, and tracking. To highlight the importance of privacy issues
and motivate future research, we motivate and introduce the Pedestrian Dataset
De-Identification (PDI) task. PDI evaluates the degree of de-identification and
downstream task training performance for a given de-identification method. As a
first baseline, we propose IncogniMOT, a two-stage full-body de-identification
pipeline based on image synthesis via generative adversarial networks. The
first stage replaces target pedestrians with synthetic identities. To improve
downstream task performance, we then apply stage two, which blends and adapts
the synthetic image parts into the data. To demonstrate the effectiveness of
IncogniMOT, we generate a fully de-identified version of the MOT17 pedestrian
tracking dataset and analyze its application as training data for pedestrian
re-identification, detection, and tracking models. Furthermore, we show how our
data is able to narrow the synthetic-to-real performance gap in a
privacy-conscious manner
The Group Loss++: A deeper look into group loss for deep metric learning
Deep metric learning has yielded impressive results in tasks such as clustering and image retrieval by leveraging neural networks to obtain highly discriminative feature embeddings, which can be used to group samples into different classes. Much research has been devoted to the design of smart loss functions or data mining strategies for training such networks. Most methods consider only pairs or triplets of samples within a mini-batch to compute the loss function, which is commonly based on the distance between embeddings. We propose Group Loss, a loss function based on a differentiable label-propagation method that enforces embedding similarity across all samples of a group while promoting, at the same time, low-density regions amongst data points belonging to different groups. Guided by the smoothness assumption that '`similar objects should belong to the same group'', the proposed loss trains the neural network for a classification task, enforcing a consistent labelling amongst samples within a class. We design a set of inference strategies tailored towards our algorithm, named Group Loss++ that further improve the results of our model. We show state-of-the-art results on clustering and image retrieval on four retrieval datasets, and present competitive results on two person re-identification datasets, providing a unified framework for retrieval and re-identification
The Group Loss++: A deeper look into group loss for deep metric learning
Deep metric learning has yielded impressive results in tasks such as
clustering and image retrieval by leveraging neural networks to obtain highly
discriminative feature embeddings, which can be used to group samples into
different classes. Much research has been devoted to the design of smart loss
functions or data mining strategies for training such networks. Most methods
consider only pairs or triplets of samples within a mini-batch to compute the
loss function, which is commonly based on the distance between embeddings. We
propose Group Loss, a loss function based on a differentiable label-propagation
method that enforces embedding similarity across all samples of a group while
promoting, at the same time, low-density regions amongst data points belonging
to different groups. Guided by the smoothness assumption that "similar objects
should belong to the same group", the proposed loss trains the neural network
for a classification task, enforcing a consistent labelling amongst samples
within a class. We design a set of inference strategies tailored towards our
algorithm, named Group Loss++ that further improve the results of our model. We
show state-of-the-art results on clustering and image retrieval on four
retrieval datasets, and present competitive results on two person
re-identification datasets, providing a unified framework for retrieval and
re-identification.Comment: Accepted to IEEE Transactions on Pattern Analysis and Machine
Intelligence (tPAMI), 2022. Includes supplementary materia